Skip to content

BUG: Inconsistent conversion of missing column names #44878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Dec 16, 2021

Conversation

johnzangwill
Copy link
Contributor

Factored out multiple occurances of missing name logic into common method.
DataFrame.to_records() changed to use common method rather than missing value count when filling missing names.
Index.to_frame() left alone for the time being.

import pandas as pd
f = pd.DataFrame([1], index=pd.MultiIndex.from_arrays([[2],[3],[4]], names=[None, "a", None]))

Old behavior:

>>> pd.DataFrame(f.to_records())
   level_0  a  level_1  0
0        2  3        4  1

New behavior:

>>> pd.DataFrame(f.to_records())
   level_0  a  level_2  0
0        2  3        4  1

@johnzangwill johnzangwill marked this pull request as draft December 14, 2021 13:39
@johnzangwill johnzangwill marked this pull request as ready for review December 14, 2021 15:48
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, otherwise lgtm

@@ -91,7 +91,7 @@ def test_to_records_index_name(self):
df.index = MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "z")])
df.index.names = ["A", None]
rs = df.to_records()
assert "level_0" in rs.dtype.fields
assert "level_1" in rs.dtype.fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you maybe add a test checking the whole result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cruel! But done.

@@ -854,6 +854,7 @@ Other
- Bug in :meth:`DataFrame.shift` with ``axis=1`` and ``ExtensionDtype`` columns incorrectly raising when an incompatible ``fill_value`` is passed (:issue:`44564`)
- Bug in :meth:`DataFrame.diff` when passing a NumPy integer object instead of an ``int`` object (:issue:`44572`)
- Bug in :meth:`Series.replace` raising ``ValueError`` when using ``regex=True`` with a :class:`Series` containing ``np.nan`` values (:issue:`43344`)
- Bug in :meth:`DataFrame.to_records` missing names filled incorrectly (:issue:`44818`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you specify a bit more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@johnzangwill johnzangwill requested a review from phofl December 14, 2021 18:59
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, not 100% sold on the location (maybe @jbrockmendel can think of a better one), but ok. ping on green.

@@ -856,6 +856,7 @@ Other
- Bug in :meth:`DataFrame.shift` with ``axis=1`` and ``ExtensionDtype`` columns incorrectly raising when an incompatible ``fill_value`` is passed (:issue:`44564`)
- Bug in :meth:`DataFrame.diff` when passing a NumPy integer object instead of an ``int`` object (:issue:`44572`)
- Bug in :meth:`Series.replace` raising ``ValueError`` when using ``regex=True`` with a :class:`Series` containing ``np.nan`` values (:issue:`43344`)
- Bug in :meth:`DataFrame.to_records` where an incorrect n was used when missing names were replaced by level_n (:issue:`44818`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add backticks around the n and level_n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add backticks around the n and level_n

Done

@@ -604,3 +604,10 @@ def is_builtin_func(arg):
otherwise return the arg
"""
return _builtin_table.get(arg, arg)


def fill_missing_names(names):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type in and out here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and added version and more docstring.

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Dec 15, 2021
@jreback jreback added this to the 1.4 milestone Dec 15, 2021
@jbrockmendel
Copy link
Member

test location makes sense to me

@pep8speaks
Copy link

pep8speaks commented Dec 16, 2021

Hello @johnzangwill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-16 13:28:03 UTC

@johnzangwill
Copy link
Contributor Author

not 100% sold on the location (maybe @jbrockmendel can think of a better one)

I suspect that @jreback was referring to the fact that I factored the logic into core/common.py.
It seemed a good place to me. My method is ref'ed in:

  • core/frame.py
  • core/indexes/multi.py
  • io/pytables.py
  • io/sql.py
  • io/json/_table_schema.py

and core/common is already imported as com into every one.
But if you have a better suggestion...

@johnzangwill johnzangwill requested a review from jreback December 16, 2021 15:21
@jreback jreback merged commit a769e38 into pandas-dev:master Dec 16, 2021
@jreback
Copy link
Contributor

jreback commented Dec 16, 2021

thanks @johnzangwill very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Inconsistent conversion of missing column names
5 participants